
Predicting mutually exclusive spliced exons based on exon length, splice site and reading frame conservation, and exon sequence homology

Author: Hammesfahr Björn
Hatje Klas
Kollmar Martin
Odronitz Florian
Pillmann Holger
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Alternative splicing of precursor mRNA (pre-mRNA) is an important process that eukaryotes use to increase their repertoire of different protein products. Several types of alternative splicing exist, including exon skipping, differential splicing of exons at their 3' or 5' end, intron retention, and mutually exclusive splicing. The latter term refers to clusters of internal exons that are spliced in a mutually exclusive manner.

Results: We have implemented an extension to the WebScipio software to search for mutually exclusive exons. The search is based on the precondition that mutually exclusive exons encode the same structural part of the protein product. This precondition restricts the search for candidate exons with respect to their length, splice site conservation, reading frame preservation, and overall homology. Consequently, mutually exclusive exons that are neither homologous nor of about the same length will not be found. Using the new algorithm, the mutually exclusive exons in several example genes (a dynein heavy chain, a muscle myosin heavy chain, and Dscam) were correctly identified. In addition, the algorithm was applied to the whole Drosophila melanogaster X chromosome, and the results were compared with the FlyBase annotation and an ab initio prediction. Clusters of mutually exclusive exons can occur consecutively and can contain dozens of exons.

Conclusions: This is the first implementation of an automatic search for mutually exclusive exons in eukaryotes. Exons are predicted and reconstructed in the same run, providing the complete gene structure for the protein query of interest. WebScipio offers high-quality gene structure figures with the clusters of mutually exclusive exons colour-coded, as well as several analysis tools for further manual inspection.

The genome-scale analysis of all genes of the Drosophila melanogaster X chromosome showed that WebScipio finds all but two of the 28 annotated mutually exclusive spliced exons and predicts 39 new candidate exons. Thus, WebScipio should be able to identify mutually exclusive spliced exons in any query sequence from any species with a very high probability. WebScipio is freely available to academics at http://www.webscipio.org.
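The precondition in this abstract (a candidate must be of about the same length, keep the splice-site phase and reading frame, and be homologous to the exon it replaces) can be illustrated as a simple candidate filter. This is a minimal sketch under stated assumptions, not WebScipio's actual algorithm: the `Exon` type, the 20% length tolerance, the 0.5 similarity cut-off, and the use of `difflib` nucleotide similarity as a stand-in for a proper protein alignment are all hypothetical choices.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Exon:
    seq: str          # coding nucleotide sequence of the exon
    start_phase: int  # 0, 1 or 2: position within a codon at which the exon starts

    @property
    def end_phase(self) -> int:
        # Phase in which the next exon has to start to keep the reading frame
        return (self.start_phase + len(self.seq)) % 3


def is_mxe_candidate(ref: Exon, cand: Exon,
                     max_len_diff: float = 0.2,             # hypothetical length tolerance
                     min_similarity: float = 0.5) -> bool:  # hypothetical cut-off
    """Return True if `cand` could replace `ref` as a mutually exclusive exon."""
    # 1. Splice-site phase and reading frame: entry and exit phases must match,
    #    otherwise swapping the exons would shift the downstream frame.
    if ref.start_phase != cand.start_phase or ref.end_phase != cand.end_phase:
        return False
    # 2. Length: the exons should be of about the same length.
    longer = max(len(ref.seq), len(cand.seq))
    if abs(len(ref.seq) - len(cand.seq)) > max_len_diff * longer:
        return False
    # 3. Homology: crude nucleotide similarity as a stand-in for a protein alignment.
    return SequenceMatcher(None, ref.seq, cand.seq).ratio() >= min_similarity
```

Checking both the entry and the exit phase is what guarantees that swapping one exon for the other leaves the downstream reading frame intact; the length and similarity thresholds would need tuning on real data.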

GenePainter: a fast tool for aligning gene structures of eukaryotic protein families, visualizing the alignments and mapping gene structures onto protein structures

Author: Hammesfahr Björn
Kollmar Martin
Mühlhausen Stefanie
Odronitz Florian
Waack Stephan
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

Background: All sequenced eukaryotic genomes have been shown to possess at least a few introns. This includes unicellular organisms that were previously suspected to be intron-less. Therefore, gene splicing must have been present at least in the last common ancestor of the eukaryotes. To explain the evolution of introns, two mutually exclusive concepts have been developed. The introns-early hypothesis holds that the very first protein-coding genes already contained introns, whereas the introns-late concept asserts that eukaryotic genes gained introns only after the emergence of the eukaryotic lineage. A key aspect in this respect is the conservation of intron positions within homologous genes of different taxa.

Results: GenePainter is a standalone application for mapping gene structure information onto protein multiple sequence alignments. Based on the multiple sequence alignments, the gene structures are aligned down to single nucleotides. GenePainter accounts for variable lengths of exons and introns, respects split codons at intron junctions, and can handle sequencing and assembly errors, which are possible causes of frame-shifts in exons and gaps in genome assemblies. Thus, even gene structures of considerably divergent proteins can be properly compared, as is required in phylogenetic analyses. Conserved intron positions can also be mapped onto user-provided protein structures. For their visualization, GenePainter provides scripts for the molecular graphics system PyMOL.
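The central step of mapping gene structures onto a protein alignment can be illustrated as follows. In this hypothetical sketch (not GenePainter's code), each intron is described by the number of coding nucleotides upstream of it; that count gives the residue the intron precedes or splits (`n // 3`) and its phase (`n % 3`, where phase 0 means the intron lies between codons). Projecting that residue onto its alignment column makes introns from different genes comparable, and positions shared by several genes are reported as conserved. The handling of frame-shifts and assembly errors mentioned in the abstract is deliberately left out.

```python
from collections import Counter


def residue_to_column(aligned: str) -> list[int]:
    """Alignment column of every residue in a gapped protein sequence."""
    return [col for col, ch in enumerate(aligned) if ch != "-"]


def intron_columns(aligned: str, introns: list[int]) -> list[tuple[int, int]]:
    """Map introns to (alignment column, phase).

    Each intron is given as the number of coding nucleotides upstream of it.
    The intron precedes (phase 0) or splits (phase 1 or 2) residue n // 3.
    """
    columns = residue_to_column(aligned)
    result = []
    for n in introns:
        residue, phase = divmod(n, 3)
        if n < 0 or residue >= len(columns):
            raise ValueError(f"intron position {n} lies outside the coding sequence")
        result.append((columns[residue], phase))
    return result


def conserved_positions(genes: dict[str, tuple[str, list[int]]],
                        min_genes: int = 2) -> list[tuple[int, int]]:
    """(column, phase) pairs shared by at least `min_genes` genes."""
    counts: Counter = Counter()
    for aligned, introns in genes.values():
        counts.update(set(intron_columns(aligned, introns)))
    return sorted(pos for pos, k in counts.items() if k >= min_genes)
```

Comparing (column, phase) pairs rather than columns alone keeps a phase-0 intron from being counted as conserved with a phase-1 intron that happens to fall in the same alignment column.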


diArk 2.0 provides detailed analyses of the ever increasing eukaryotic genome sequencing data

Author: AV Zimin
Björn Hammesfahr
CG Elsik
D Weigel
DA Wheeler
E Pennisi
ES Lander
EW Sayers
F Odronitz
FD Guerrero
Florian Odronitz
G Liti
J Heer
J Xu
JC Venter
JD McPherson
K Liolios
M Bostock
Marcel Hellkamp
Martin Kollmar
ML Metzker
N Goto
ND Mendes
NK Petty
Q Xia
R Li
RM Durbin
S Diguistini
S Richards
S Tangphatsornruang
TJ Sharpton
X Qin
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Nowadays, even the largest mammalian genomes can be sequenced within days using current next-generation sequencing methods. It comes as no surprise that dozens of genome assemblies are now released per month. Since the number of next-generation sequencing machines is increasing worldwide and new major sequencing plans are being announced, the rate at which genome assemblies are released is expected to increase further. It is therefore increasingly important to obtain both an overview of and detailed information about the available sequenced genomes. The different sequencing and assembly methods have specific characteristics that need to be known to evaluate the various genome assemblies before performing subsequent analyses.

Results: diArk has been developed to provide fast and easy access to all sequenced eukaryotic genomes worldwide. Currently, diArk 2.0 contains information about more than 880 species and more than 2350 genome assembly files. Extensive metadata are provided, such as sequencing and read-assembly methods, sequencing coverage, GC content, extended lists of alternative scientific names and common species names, and various kinds of statistics. To make the data intuitively accessible, the web interface makes extensive use of modern web techniques. A number of search modules and result views facilitate finding and judging the data of interest. Subscribing to the RSS feed is the easiest way to stay up to date with the latest genome data.

Conclusions: diArk 2.0 is the most up-to-date database of sequenced eukaryotic genomes compared with databases like GOLD, NCBI Genome, NHGRI, and ISC. It differs in that only projects for which genome assembly data or considerable amounts of cDNA data are available are stored. Projects in the planning stage or in the process of being sequenced are not included. The user can easily search through the provided data and directly access the genome assembly files of the sequenced genome of interest.

diArk 2.0 is available at http://www.diark.org.

Cross-species protein sequence and gene structure prediction with fine-tuned Webscipio 2.0 and Scipio

Author: AA Salamov
AG Clark
Björn Hammesfahr
BM Tyler
C Burge
C Wei
E Birney
E Picardi
E van Nimwegen
ER Mardis
F Odronitz
G Butler
GS Slater
Holger Pillmann
Klas Hatje
M Deutsch
M Srivastava
M Stanke
Martin Kollmar
MJ Benton
MJ Gardner
N Goto
O Keller
Oliver Keller
RF Yeh
SE Prochnik
SF Altschul
SJ Yoon
Stephan Waack
SW Roy
V Solovyev
VN Babenko
WJ Kent
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Obtaining transcripts of homologs from closely related organisms and retrieving the reconstructed exon-intron patterns of the genes is a very important step in analyzing the evolution of a protein family and in comparing the exon-intron structure of a given gene across species. Because genome sequencing keeps getting faster, the gap between sequenced and annotated genomes is growing. Tools for the correct prediction and reconstruction of genes in related organisms are therefore becoming increasingly important. The tool Scipio, which can also be used via the graphical interface WebScipio, performs extensive processing of the hits reported by the Blat program to account for sequencing errors, missing sequence, and fragmented genome assemblies. However, Scipio has so far been limited to high sequence similarity and has been unable to reconstruct short exons.

Results: Scipio and WebScipio have been fundamentally extended to better reconstruct very short exons and intron splice sites and to be better suited for cross-species gene structure prediction. The Needleman-Wunsch algorithm has been implemented to search for short parts of the query sequence that were not recognized by Blat. These regions might be short exons, divergent sequence at intron splice sites, or very divergent exons. We demonstrate the benefit and use of the new parameters with several protein examples from completely different protein families in searches against species from several eukaryotic kingdoms. The performance of the new Scipio version has been tested in comparison with several similar tools.

Conclusions: With the new version of Scipio, very short terminal and internal exons, even those encoding just one amino acid, can be correctly reconstructed. Scipio is also able to correctly predict almost all genes in cross-species searches, even if the species diverged more than 100 Myr ago and even if the protein sequence identity is below 80%.

For our test cases, Scipio outperforms all other software tested. WebScipio has been restructured and provides easy access to the genome assemblies of about 640 eukaryotic species. Scipio and WebScipio are freely accessible at http://www.webscipio.org.
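For readers unfamiliar with the Needleman-Wunsch algorithm named in this abstract, the following is a textbook global alignment with linear gap costs and a traceback. It is only an illustration: the match, mismatch, and gap scores are arbitrary assumptions, and how Scipio applies the algorithm to the query regions missed by Blat, including its own scoring, is not reproduced here.

```python
def needleman_wunsch(a: str, b: str, match: int = 2, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),  # (mis)match
                              score[i - 1][j] + gap,             # gap in b
                              score[i][j - 1] + gap)             # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

Because the alignment is global, every residue of both sequences is placed, which suits short query fragments that must be accounted for end to end; a local variant (Smith-Waterman) would instead report only the best-scoring sub-region.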

A holistic phylogeny of the coronin gene family reveals an ancient origin of the tandem-coronin, defines a new subfamily, and predicts protein function

Author: Eckert Christian
Hammesfahr Björn
Kollmar Martin
Publication venue: Springer Science and Business Media LLC
Publication date: 01/09/2011

Background: Coronins belong to the superfamily of eukaryote-specific WD40-repeat proteins and play a role in several actin-dependent processes such as cytokinesis, cell motility, phagocytosis, and vesicular trafficking. Two major types of coronins are known: first, the short coronins, consisting of an N-terminal coronin domain, a unique region, and a short coiled-coil region; and second, the tandem coronins, comprising two coronin domains.

Results: A total of 723 coronin proteins from 358 species have been identified by analyzing the whole-genome assemblies of all available sequenced eukaryotes (March 2011). The organisms analyzed represent most eukaryotic kingdoms and cover every taxon several times to provide better statistical sampling. The phylogenetic tree of the coronin domains, computed with a Bayesian method, is in accordance with the most recent grouping of the major eukaryotic kingdoms and also with the grouping of more recently separated branches. Based on this "holistic" approach, the coronins group into four classes: class-1 (Type I) and class-2 (Type II) are metazoan/choanoflagellate-specific classes, class-3 contains the tandem-coronins (Type III), and the new class-4 represents the coronins fused to villin (Type IV). Short coronins from non-metazoans are equally related to class-1 and class-2 coronins and thus remain unclassified.

Conclusions: The coronin class distribution suggests that the last common eukaryotic ancestor possessed a single-domain coronin and a tandem-coronin, and most probably a class-4 coronin. Homologs of class-4 coronins have been identified in Excavata and Opisthokonts, although most of these species subsequently lost the class-4 homolog. The most ancient short coronin already contained the trimerization motif in the coiled-coil domain.

Peakr: simulating solid-state NMR spectra of proteins

Author: Hammesfahr Björn
Hellkamp Marcel
Kollmar Martin
Odronitz Florian
Schneider Robert
Publication venue: Oxford University Press (OUP)
Publication date: 01/05/2013

When analyzing solid-state nuclear magnetic resonance (NMR) spectra of proteins, assigning resonances to nuclei and deriving restraints for 3D structure calculations are challenging and time-consuming processes. Simulated spectra calculated from, for example, chemical shift predictions and structural models can be of considerable help. Existing solutions are typically limited in the types of experiments they can consider and are difficult to adapt to different settings. Here, we present Peakr, a software tool to simulate solid-state NMR spectra of proteins. It can generate simulated spectra based on numerous common types of internuclear correlations relevant for assignment and structure elucidation, can compare simulated and experimental spectra, and produces lists and visualizations useful for analyzing measured spectra. Compared with other solutions, it is fast, versatile and user-friendly. Peakr is maintained under the GPL license and can be accessed at http://www.peakr.org. The source code can be obtained on request from the authors.
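As a rough illustration of simulating correlation peaks from chemical shift predictions and comparing them with measured peaks, the sketch below builds two common 2D peak lists from per-residue shifts: intraresidue N/CA pairs (NCA) and sequential N(i)/C'(i-1) pairs (NCO). It then flags simulated peaks with no experimental peak nearby. This is a hypothetical sketch, not Peakr's implementation: the data layout, the atom names ("C" standing for the carbonyl C'), and the 1.0 ppm (15N) and 0.5 ppm (13C) tolerances are assumptions.

```python
# Hypothetical layout: one dict per residue mapping atom names to shifts in ppm.
Shifts = dict[str, float]
Peak = tuple[str, int, float, float]  # (experiment, residue index, 15N shift, 13C shift)


def simulate_nca_nco(shifts: list[Shifts]) -> list[Peak]:
    """Simulate 2D NCA (N(i)/CA(i)) and NCO (N(i)/C'(i-1)) peak lists."""
    peaks: list[Peak] = []
    for i, res in enumerate(shifts):
        if "N" in res and "CA" in res:
            peaks.append(("NCA", i, res["N"], res["CA"]))
        # NCO correlates the amide N of residue i with the carbonyl C' of residue i-1
        if i > 0 and "N" in res and "C" in shifts[i - 1]:
            peaks.append(("NCO", i, res["N"], shifts[i - 1]["C"]))
    return peaks


def unmatched(simulated: list[Peak], experimental: list[tuple[float, float]],
              tol_n: float = 1.0, tol_c: float = 0.5) -> list[Peak]:
    """Return simulated peaks with no experimental peak within both tolerances."""
    return [
        p for p in simulated
        if not any(abs(p[2] - n) <= tol_n and abs(p[3] - c) <= tol_c
                   for n, c in experimental)
    ]
```

For brevity, both peak types are compared against a single experimental list here; in practice each experiment would be compared with its own measured spectrum.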


diArk – the database for eukaryotic genome and transcriptome assemblies in 2014

Author: 3,000 Rice Genomes Project
Aurrecoechea
Basu
Björn Hammesfahr
Bradnam
Daetwyler
Dominic Simm
Drysdale
Gallant
Genome 10K Community of Scientists
Goodstein
Haas
Hammesfahr
Hatje
i5K Consortium
Keeling
Kumar
Lotte Kollmar
Martin Kollmar
Megy
Nanda
Pagani
Sneddon
Soria-Carrasco
Stajich
Weigel
Yook
Publication venue: Oxford University Press (OUP)